Abstract

This paper focuses on the optimal sensor placement problem for the identification of pipe failure locations in large-scale 
urban water systems. The problem involves selecting the minimum number of sensors such that every pipe failure can be 
uniquely localized. This problem can be viewed as a minimum test cover (MTC) problem, which is NP-hard. We consider 
two approaches to obtain approximate solutions to this problem. In the first approach, we transform the MTC problem to 
a minimum set cover (MSC) problem and use the greedy algorithm that exploits the submodularity property of the MSC 
problem to compute the solution to the MTC problem. In the second approach, we develop a new augmented greedy algorithm 
for solving the MTC problem. This approach does not require the transformation of the MTC to MSC. Our augmented greedy 
algorithm provides a significant computational improvement while guaranteeing the same approximation ratio as the first
approach. We propose several metrics to evaluate the performance of the sensor placement designs. Finally, we present detailed 
computational experiments for a number of real water distribution networks. 

Key words: Fault identification; Minimum test cover; Water networks. 


1 Introduction 

Infrastructure deterioration, demand-supply uncertainty, and risk of
disruptions pose new challenges in maintaining modern infrastructures.
Resilient urban infrastructures, including water distribution systems,
transportation networks, and electric grids, are crucial for societal
well-being. Smart infrastructure operation driven by sensing and actuation
technologies has been identified as one of the primary solutions towards
resilient urban systems [26,40]. Through a network of sensors, an individual
fault or correlated failures in a system component can be detected and
localized, and restorative actions can be executed in response to these
faults. Whereas network observability for a given sensing capability has been
widely studied in the context of fault detection, sensor placement for fault
isolability, i.e., the ability to distinguish between faults, has not been a
commonly studied problem, especially in the context of pipe bursts in water
distribution networks.

The goal of this work is to design a sensor placement configuration for
identification of pipe failure locations by using the minimum number of
sensors. The underlying idea behind our approach is to ensure that the sensor
placement results in a collective output that is unique for each failure
event. Specifically, our main contributions are as follows. We:

- Define the localization of pipe bursts as the design objective of a sensor
network configuration and, using ideas from combinatorial optimization,
formulate the fault location identification problem as a minimum test cover
(MTC) problem;

- Develop a computationally efficient augmented greedy algorithm to solve the
minimum test cover problem (resp. identification problem), which is
significantly faster than previous approaches and therefore scalable to
large-scale networks; and

- Test and evaluate our sensor placement approach on a batch of real networks
of various sizes and parameters using practically relevant performance
measures.

Our paper is motivated by the need to consider localization of pipe bursts in
the deployment phase of new sensing technologies, since this consideration can
significantly reduce the response time and overall costs of fault localization
to the distribution utilities. We base our work on the use of low-cost,
high-rate online sensors measuring water pressure for remote detection of pipe
bursts using data mining techniques. Real-world examples are the PIPENET in
Boston, MA, US [34] and the WaterWise in Singapore [4]. The sensor placement
problem is not unique to the water sector and can be found in many engineering
applications for system operation. We discuss some of the related work in
Section 7.

In Section 2, we present the network and the sensing models and formulate the
detection and identification problems as the minimum set cover (MSC) and
minimum test cover (MTC) problems, respectively. A key aspect of the MTC
problem formulation is the choice of the objective function, which is to
select the minimum number of tests from a collection of tests such that every
event can be uniquely classified in one of the given categories based on the
selected tests' outcomes [22]. In our setup, the set of outcomes of tests
comprises the output vector from sensors, events are pipe failures, and
classification categories are the possible locations of the failed pipes. In
Section 3, we present a solution approach as in [14,35], in which the MTC is
first transformed to the MSC and then solved using the greedy approximation
[20].

In Section 4, we present an augmented greedy algorithm for solving the MTC
that does not require the complete transformation of the MTC to the equivalent
MSC, and directly computes the objective function in a greedy fashion. This
algorithm is much faster than the standard greedy approach and considerably
improves the scalability of our approach. In Sections 5 and 6, we demonstrate
our approach using a benchmark and a batch of twelve real water distribution
networks of various sizes and specifications. We suggest four metrics to
evaluate the performance of the design including detection, identification,
and localization scores. Although we demonstrate our results in the context of
water networks, our algorithm provides an improved solution to the generic
test cover problem. Section 8 summarizes our work and proposes future
extensions.

2 Problem formulation 

Consider the problem of placing online sensors measuring hydraulic pressures
with high frequency such that the identification of pipe failure locations is
maximized. Based on the number of pipes where link failures (i.e., pipe
bursts) can happen, we consider $n$ link failures as a set of failure events,
denoted by $\mathcal{L} = \{\ell_1, \ldots, \ell_n\}$.

For ease of presentation and without loss of generality, let $\ell_j$ denote
the failure event at the $j$-th pipe. Moreover, we define a set of sensors
that can be placed at $m$ nodes of the network as
$\mathcal{S} = \{S_1, \ldots, S_m\}$. Here, $S_i$ denotes the location of the
$i$-th sensor. The outputs from sensors, which are based on the change in
pressure induced by the failure event, are denoted by $y_S$.

2.1 Network dynamics and sensing model 

A water distribution network can be represented by a graph comprising nodes
(supply and demand) connected by links (pipes, valves, and pumps). Physical
failures of the infrastructure, such as pipe bursts, cause a disturbance in
the flow, which moves through the system as a pressure wave known as water
hammer, or surge, with very high velocity, typically in the range of
600-1500 m/s [21]. This implies that the steady state analysis employed by
traditional methods such as supervisory control and data acquisition (SCADA)
systems is inadequate and that the transient system dynamics between the
initial and the final steady state conditions need to be considered.

The transient system state can typically be described by mass and momentum
partial differential equations [38]. The method of characteristics (MOC) is a
numerical technique typically used to approximate the solution of the
hydraulic transients. The MOC transforms the partial differential equations
into ordinary differential equations that evolve along specific characteristic
lines of the numerical grid, which are solved explicitly to compute the head
and flow, $h_{i,t+1}$ and $q_{i,t+1}$, at a new point in time and space. Here,
$t$ and $i$ indicate the discrete points of the numerical grid. For a given
pipe, the two characteristic equations describing the hydraulic transients are
formulated as [21]:

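$$h_{i,t+1} = \frac{1}{2}\Big[ h_{i-1,t} + h_{i+1,t} + b\,(q_{i-1,t} - q_{i+1,t}) + r\,\big(q_{i+1,t}|q_{i+1,t}| - q_{i-1,t}|q_{i-1,t}|\big) \Big] \quad (1)$$

$$q_{i,t+1} = \frac{1}{2b}\Big[ h_{i-1,t} - h_{i+1,t} + b\,(q_{i-1,t} + q_{i+1,t}) - r\,\big(q_{i+1,t}|q_{i+1,t}| + q_{i-1,t}|q_{i-1,t}|\big) \Big], \quad (2)$$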

where $r$ is the resistance coefficient associated with the steady state, and
$b$ is the impedance coefficient associated with the transient state. For
$b = 0$ the set of equations (1), (2) is reduced to the steady state, where
the head loss along a pipe occurs only due to friction [36]. Additional
information describing transient dynamics can be found in the supporting
information (SI) [27].

The effect of a pipe burst at location $i$ can be translated into boundary
conditions using the orifice head-flow relation [38]. Before the burst occurs,
the cross-section area of the orifice is equal to zero and it increases during
a burst; hence we can expect a sudden change in the hydraulic head. The
relationship between the head and the pressure measured by the sensors at
location $i$ depends on the elevation of the sensor location. If $z_i$ is the
elevation, and $p_{i,t}$ is the pressure at location $i$ at any given time
$t$, then $p_{i,t} = (h_{i,t} - z_i)\,\rho g$, where $g$ is the gravitational
acceleration [m/s^2] and $\rho$ is the water density [kg/m^3]. Hence, the
disturbance caused by a pipe burst that reaches the sensor location can be
detected by sensing the hydraulic pressure. Similar approaches have been
suggested in [39].
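The head-to-pressure relation above can be illustrated with a short numerical sketch. The function name and the default values for water density and gravitational acceleration are assumptions chosen for illustration; they are not taken from the paper.

```python
# Minimal sketch of p = (h - z) * rho * g.
# RHO_WATER and G are assumed nominal values, not values from the paper.
RHO_WATER = 1000.0  # water density [kg/m^3] (assumed)
G = 9.81            # gravitational acceleration [m/s^2] (assumed)


def head_to_pressure(head_m, elevation_m, rho=RHO_WATER, g=G):
    """Convert hydraulic head [m] at a node with elevation [m] to pressure [Pa]."""
    return (head_m - elevation_m) * rho * g


if __name__ == "__main__":
    # A node at 20 m elevation with a hydraulic head of 50 m.
    print(head_to_pressure(50.0, 20.0))
```

A sudden drop in head at the sensor node therefore appears as a proportional drop in measured pressure, which is what the sensing model below thresholds.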

The disturbance caused by the pipe burst quickly dissipates with the distance
between the burst event $\ell_j$ and the location of the sensor $S_i$. For the
purpose of sensor placement, we are interested in obtaining the sensor's
output as a result of some event $\ell_j$. Let
$y_{S_i}(t, \ell_j) \in \{0, 1\}$ be a discrete state (output) of the sensor
$S_i$ at time $t$, where 1 represents a possible detected event and 0
represents otherwise. Let $\psi$ be a function characterizing the distance
between the expected pressure (i.e., when there is no pipe burst), denoted by
$\hat{p}_{i,t}$, and the measured pressure, denoted by $p_{i,t}$. The sensor
output can then be formulated as:

$$y_{S_i}(t, \ell_j) = \begin{cases} 1 & \text{if } \psi(p_{i,t} - \hat{p}_{i,t}) > \epsilon \\ 0 & \text{otherwise} \end{cases} \quad (3)$$



Fig. 1. Illustrative example layout 


where $\epsilon$ is a threshold value. A simple detection model would be one
where the sensor $S_i$ indicates an event if the change in the pressure is
above some threshold value $\epsilon$. We note here that when the failure
event occurs during a given time period, the output of $S_i$ will be 1 (or 0)
independent of the time of the event $\ell_j$. Hence, we can neglect the time
dependency of the sensor output to detect the event and can restate the output
of the sensor as:

$$y_{S_i}(\ell_j) = \begin{cases} 1 & \text{if } y_{S_i}(t, \ell_j) = 1 \text{ for any } t > 0 \\ 0 & \text{otherwise} \end{cases} \quad (4)$$


Let $y_S(\ell_j) = [y_{S_1}(\ell_j), \ldots, y_{S_m}(\ell_j)]$ be the fault
signature [6] of the failure event $\ell_j$, represented by a boolean vector
of the outputs of sensors in the set $\mathcal{S}$.
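The thresholding in (3) and the time-independent output in (4) can be sketched as follows. This is only an illustration: the distance function is assumed here to be the absolute deviation between measured and expected pressure, and the function names and the sample traces are hypothetical.

```python
def sensor_output(measured, expected, eps):
    """Boolean sensor output in the spirit of (3)-(4).

    Assumption: the distance function is the absolute deviation
    |p - p_hat|. The output is 1 if the deviation exceeds eps at any
    time step, which removes the time dependency as in (4).
    """
    return int(any(abs(p - p_hat) > eps for p, p_hat in zip(measured, expected)))


def fault_signature(traces, expected_traces, eps):
    """Boolean vector of sensor outputs for one failure event."""
    return [sensor_output(m, e, eps) for m, e in zip(traces, expected_traces)]


if __name__ == "__main__":
    # Two hypothetical sensors; only the first sees a drop larger than eps.
    traces = [[50.0, 50.0, 42.0, 49.0], [50.0, 49.5, 49.8, 50.0]]
    expected = [[50.0] * 4, [50.0] * 4]
    print(fault_signature(traces, expected, eps=5.0))
```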

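Consequently, for a sensor set $\mathcal{S}$ and the set of events
$\mathcal{L}$, we can instantiate a boolean matrix of dimensions
$|\mathcal{L}| \times |\mathcal{S}|$ called the influence matrix and denoted
by $\mathcal{M}$. The $j$-th row of $\mathcal{M}$ consists of the sensors'
outputs in response to the event $\ell_j$, i.e., $y_S(\ell_j)$. The entry in
row $j$ and column $i$ satisfies $\mathcal{M}_{ji} = 1$ if sensor $S_i$
detected the failure at link $\ell_j$, and $\mathcal{M}_{ji} = 0$ otherwise.
Each row of the influence matrix $\mathcal{M}$ is analogous to the notion of
fault signature in the model-based fault diagnosis systems literature [6].

$$\mathcal{M}(\mathcal{L}, \mathcal{S}) = \begin{bmatrix} y_S(\ell_1) \\ y_S(\ell_2) \\ \vdots \\ y_S(\ell_n) \end{bmatrix} \quad (5)$$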


Furthermore, for the set of link failures $\mathcal{L}$ and the set of all
possible sensor locations $\mathcal{S}$, let $C_i \subseteq \mathcal{L}$ be
the set of link failure events detected by the sensor $S_i$, i.e.,
$C_i = \{\ell_j \in \mathcal{L} : y_{S_i}(\ell_j) = 1\}$. If $\mathcal{C}$ is
the collection of all such $C_i$'s, i.e., $\mathcal{C} = \{C_i : \forall i\}$,
then for a given subset of sensors $S \subseteq \mathcal{S}$, we define
$\mathcal{C}_S \subseteq \mathcal{C}$ as a set of subsets of failure events,
where each subset corresponds to a sensor in $S$ that detects the failure
events in that subset, i.e., $\mathcal{C}_S = \{C_i : S_i \in S\}$.

Fig. 2. Failure event generated in pipe $\ell_1$ in the small example -
pressure head [m] and outputs of sensors $S_2$, $S_4$.

Example 1 (Sensing model) To illustrate the network dynamics, consider a
small network having 8 nodes connected by 10 links as shown in Figure 1. A
pipe burst event is simulated in the middle of pipe $\ell_1$ and the system
response at network nodes is recorded. For ease of notation, we designate the
failure events by the pipes' ids, $\ell_j$. The transient simulations were
computed using the HAMMER software [1]. Figure 2 shows simulated pressure
heads and boolean outputs $y_S$ for sensors located at nodes 2 and 4. Thus,
for $S = \{S_2, S_4\}$ the sensors' state is $y_S(\ell_1) = [1, 0]$. If
sensors are placed at all nodes of the network, then the sensors' state in
the case of failure at $\ell_1$ is $y_S(\ell_1) = [1, 1, 1, 0, 1, 0, 0, 0]$,
$y_S(\ell_2) = [1, 1, 1, 1, 0, 1, 0, 0]$, and so on. The corresponding
influence matrix is

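$$\mathcal{M}(\mathcal{L}, \mathcal{S}) =
\begin{array}{c|cccccccc}
 & S_1 & S_2 & S_3 & S_4 & S_5 & S_6 & S_7 & S_8 \\ \hline
\ell_1 & 1 & 1 & 1 & 0 & 1 & 0 & 0 & 0 \\
\ell_2 & 1 & 1 & 1 & 1 & 0 & 1 & 0 & 0 \\
\ell_3 & 1 & 1 & 0 & 1 & 1 & 0 & 0 & 1 \\
\ell_4 & 1 & 0 & 1 & 1 & 1 & 1 & 1 & 0 \\
\ell_5 & 1 & 0 & 1 & 1 & 0 & 1 & 1 & 0 \\
\ell_6 & 0 & 1 & 1 & 1 & 1 & 0 & 1 & 1 \\
\ell_7 & 0 & 0 & 1 & 1 & 1 & 1 & 1 & 1 \\
\ell_8 & 0 & 1 & 0 & 1 & 1 & 0 & 1 & 1 \\
\ell_9 & 0 & 0 & 1 & 1 & 0 & 1 & 1 & 1 \\
\ell_{10} & 0 & 0 & 0 & 1 & 1 & 1 & 1 & 1
\end{array}$$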

Next, we formulate the detection and identification problems as the minimum
set and test cover problems, respectively.
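As a small illustration of the sets $C_i$ defined earlier, the sketch below derives them from the influence matrix of Example 1. The matrix entries are transcribed from the example as read from the text; the helper name `detected_sets` and the 0-based indexing of links and sensors are conventions of this sketch only.

```python
# Influence matrix of Example 1: one string per link failure (rows l1..l10),
# one character per sensor (columns S1..S8), as read from the text.
ROWS = [
    "11101000", "11110100", "11011001", "10111110", "10110110",
    "01111011", "00111111", "01011011", "00110111", "00011111",
]
M = [[int(c) for c in row] for row in ROWS]


def detected_sets(M):
    """Return C_i for every sensor i: the set of (0-based) link indices it detects."""
    n_sensors = len(M[0])
    return [{j for j, row in enumerate(M) if row[i] == 1} for i in range(n_sensors)]


if __name__ == "__main__":
    C = detected_sets(M)
    for i, c in enumerate(C, start=1):
        print(f"C_{i} has {len(c)} link failures")
```

With this transcription, sensors 1 and 2 each detect five link failures, which agrees with the values used later in Example 3.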

2.2 Detection as MSC 

For the set of events $\mathcal{L}$ and the set of sensors $\mathcal{S}$, we
define a detectable event as one for which there exists at least one sensor in
$\mathcal{S}$ that detects the event. The detection problem is to select the
minimum number of sensors $S \subseteq \mathcal{S}$, such that when a
detectable event occurs, at least one sensor in $S$ detects the event. For a
given subset of sensors $S$, we define the detection function, denoted by
$f_D$, as follows:

$$f_D(\mathcal{C}_S) = \Big| \bigcup_{C_i \in \mathcal{C}_S} C_i \Big| \quad (6)$$

The detection function in (6) gives the number of link failures in
$\mathcal{L}$ that can be detected by the sensors in $S$. The detection
problem is to select a subset of sensors $S \subseteq \mathcal{S}$ with the
minimum cardinality such that all detectable events are detected, i.e.,
$f_D(\mathcal{C}_S) = f_D(\mathcal{C})$. The detection performance of a subset
of sensors $S$ is defined as the normalized detection score, $I_D(S)$, and is
computed as $f_D(\mathcal{C}_S)/|\mathcal{L}|$. The detection problem is
equivalent to the minimum set cover problem, which can be defined as:

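Definition 2.1 (Minimum set cover (MSC)) Let $\mathcal{L}$ be a finite set of
elements, and $\mathcal{C} = \{C_i : C_i \subseteq \mathcal{L}\}$ be the
collection of given subsets of $\mathcal{L}$. The minimum set cover is to find
$\mathcal{C}_S \subseteq \mathcal{C}$ with the minimum cardinality such that

$$\bigcup_{C_i \in \mathcal{C}} C_i = \bigcup_{C_j \in \mathcal{C}_S} C_j.$$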

In the above definition, if $\mathcal{L}$ is the set of link failures and
$\mathcal{C}$ is the collection of $C_i$'s corresponding to all the available
sensors, then a set cover of minimum size, $\mathcal{C}_S$, gives the minimum
number and locations of sensors that solve the detection problem. Thus, we get
the following:

Proposition 2.1 The problem of detection of link failures in a network is
equivalent to the minimum set cover problem, and a solution to MSC is
therefore a solution to the detection problem.

The MSC problem is closely related to the maximum coverage problem [37], which
emerges when the number of sensors that could be used is limited, i.e.,
$|S| \le B$. The objective of the maximum coverage problem is to select the
sensors such that the number of detectable events is maximized and the
constraint $|S| \le B$ is satisfied. In Section 3.1 we discuss the greedy
solution approach, which is very similar for the MSC and the maximum coverage
problems.

2.3 Identification as MTC

For the identification of link failures, the goal is to uniquely detect the
events in $\mathcal{L}$, i.e., to distinguish between events using the outputs
of sensors. We note that event $\ell_i \in \mathcal{L}$ can be distinguished
from event $\ell_j \in \mathcal{L}$ if there exists a sensor in $S$ that gives
different outputs for $\ell_i$ and $\ell_j$. In such a case, we say that the
pair-wise event $\ell_i, \ell_j$ is detectable if
$\exists S_p \in S : y_{S_p}(\ell_i) \neq y_{S_p}(\ell_j)$. In terms of the
influence matrix of the network, if a pair-wise event $\ell_i, \ell_j$ is
detectable, then there exists a column with different entries in rows $i$ and
$j$. It follows that an event $\ell_i$ can be uniquely detected if all
pair-wise events $\ell_i, \ell_j$, $\forall j \neq i$, are detectable.

The identification problem is now defined as follows: for a given
$\mathcal{L}$ and $\mathcal{S}$, the identification problem is to select a
subset of sensors $S \subseteq \mathcal{S}$ with the minimum cardinality, such
that every detectable pair-wise event can be detected by at least one sensor
in $S$. The identification function of $S$, $f_I(\mathcal{C}_S)$, is the
number of pair-wise events that are detected by a subset of sensors
$S \subseteq \mathcal{S}$, and will be further discussed in Section 3.2.1. The
identification problem is equivalent to the minimum test cover problem, which
is defined as follows [7]:

Definition 2.2 (Minimum test cover (MTC)) Consider a finite set $\mathcal{L}$
and a collection of subsets
$\mathcal{C} = \{C_i : C_i \subseteq \mathcal{L}\}$. The minimum test cover is
to find $\mathcal{C}_t \subseteq \mathcal{C}$ with the minimum cardinality
such that if for a pair of elements $\{\ell_u, \ell_v\} \subseteq \mathcal{L}$
there exists $C_i \in \mathcal{C}$ that contains either $\ell_u$ or $\ell_v$
but not both, then there exists some $C_j \in \mathcal{C}_t$ that also
contains either $\ell_u$ or $\ell_v$, but not both.

The identification problem is to find a subset
$\mathcal{C}_t \subseteq \mathcal{C}$ of minimum cardinality, or equivalently
the corresponding subset of sensors $S \subseteq \mathcal{S}$, such that if
$y_{\mathcal{S}}(\ell_j)$ is unique with respect to the set of all sensors
$\mathcal{S}$, then $y_S(\ell_j)$ is also unique with respect to the subset of
sensors $S$, which is the MTC problem defined above. Thus, we can state:

Proposition 2.2 The problem of identification of link failures in networks is
equivalent to the minimum test cover problem, and therefore a solution to MTC
is also a solution to the identification problem.

Example 2 (Detection vs. Identification) Following Example 1, consider two
sensors placed at nodes 2 and 4, $S = \{S_2, S_4\}$. For the detection
problem, we note that $C_2 \cup C_4 = \mathcal{L}$. That is, at least one of
the sensors in $S$ has an output 1 whenever a link fails. Thus, sensors $S_2$
and $S_4$ cover (detect) all link failures and solve the detection problem.
For the identification problem, sensors 2 and 4 are not sufficient, as they
generate only three unique states associated with the 10 events, which makes
it impossible to distinguish between all link failures. For example, the state
$\{1, 0\}$ is uniquely associated with a failure in link $\ell_1$, whereas the
state $\{1, 1\}$ can be associated with a failure in any of the links
$\ell_2, \ell_3, \ell_6$, or $\ell_8$. However, for the set of sensors
$S^* = \{S_1, S_2, S_3, S_5\}$, which solves the MTC problem for Example 1,
the output is unique for each link failure, i.e., ten distinct indicator
vectors, each corresponding to a unique failure event, are obtained.
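The claims of Example 2 can be checked numerically with a short sketch. The matrix is the influence matrix of Example 1 as read from the text; the helper names and the 0-based sensor indices (sensor 2 is index 1, and so on) are choices of this sketch, not of the paper.

```python
# Influence matrix of Example 1 (rows l1..l10, columns S1..S8).
ROWS = [
    "11101000", "11110100", "11011001", "10111110", "10110110",
    "01111011", "00111111", "01011011", "00110111", "00011111",
]
M = [[int(c) for c in row] for row in ROWS]


def signatures(M, sensors):
    """Fault signature of every event restricted to the chosen sensors."""
    return [tuple(row[i] for i in sensors) for row in M]


def solves_detection(M, sensors):
    """True if every event triggers at least one chosen sensor."""
    return all(any(sig) for sig in signatures(M, sensors))


def solves_identification(M, sensors):
    """True if all events have pairwise distinct signatures."""
    sigs = signatures(M, sensors)
    return len(set(sigs)) == len(sigs)


if __name__ == "__main__":
    s24 = [1, 3]            # sensors S2 and S4
    s_star = [0, 1, 2, 4]   # sensors S1, S2, S3, S5
    print(solves_detection(M, s24), len(set(signatures(M, s24))))
    print(solves_identification(M, s_star))
```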


3 Greedy MTC solution 

It is well known that both MSC and MTC are NP-hard 
problems [13,37]. In this section, we first introduce an 
approximate solution to the MSC, which will be utilized 
in Section 4 for constructing a computationally efficient 
solution of the MTC problem. 
3.1 Detection solution

MSC has been studied extensively owing to its wide variety of applications in
theoretical as well as practical domains. A straightforward way to solve the
MSC is by the greedy approach. The greedy approach is to select, in each
iteration, a sensor that detects the maximum number of undetected link
failures, until all link failures are detected, or no further link failure can
be detected by any sensor. In the maximum coverage problem, iterations
continue until a given number of sensors are selected. If $n$ is the total
number of link failures and $m$ is the total number of sensors, then the
greedy algorithm for the MSC gives the best approximation ratio of
$\Theta(\ln n)$ [13,19]. In fact, if $k$ is the maximum number of link
failures that can be detected by any sensor, then the greedy algorithm has an
approximation ratio of $\Theta(\ln k)$, which is the best possible (unless
P=NP) [37]. In our context, $k$ depends on the network topology and the
sensing model as in (5). Similarly, for the maximum coverage problem, the
greedy algorithm gives the approximation ratio of $(1 - 1/e)$, which is again
the best possible.

Although the greedy approach gives the best known approximation ratio, its
straightforward implementation requires a large number of evaluations of the
function in (6). The running time of the greedy approach is a function of the
number of sensors and events, $O(mn)$. For large-scale systems, in which $n$
and $m$ are very large, this simple greedy approach becomes computationally
intractable owing to a large number of function evaluations, even if computing
a function is not expensive. However, the greedy algorithm can be made faster
by reducing the number of function evaluations if the submodularity property
is satisfied [20]. Submodular functions can be defined as follows:

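Definition 3.1 (Submodularity) Let $\mathcal{C}$ be a finite set and $f$ be a
set function, $f : 2^{\mathcal{C}} \to \mathbb{R}$. Moreover, let
$\mathcal{C}_S \subseteq \mathcal{C}_R \subseteq \mathcal{C}$ and
$C_i \in \mathcal{C} \setminus \mathcal{C}_R$; then $f$ is submodular whenever

$$f(\mathcal{C}_S \cup \{C_i\}) - f(\mathcal{C}_S) \ge f(\mathcal{C}_R \cup \{C_i\}) - f(\mathcal{C}_R) \quad (7)$$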

For the detection problem, this means that as the number of link failures
detected by the selected sensors increases, the marginal value of adding a
sensor to the cover decreases. It can be shown that the function in (6) is
submodular (see [27]), and the submodularity of $f_D$ can be exploited to
obtain the lazy greedy algorithm as in [20]. The basic idea behind the lazy
greedy approach is to eliminate the redundant computations in each iteration.
This can be further explained as follows. For the $\kappa$-th iteration, let
$F_\kappa(C_i)$ denote the utility of adding a sensor $i$ to the cover, i.e.,
$f_D(\mathcal{C}_S \cup \{C_i\}) - f_D(\mathcal{C}_S)$; then by the
submodularity of $f_D$, we know that $F_{\kappa+1}(C_i) \le F_\kappa(C_i)$.
Moreover, without loss of generality, we assume that
$F_\kappa(C_1) \ge F_\kappa(C_2) \ge \cdots$; then $C_1$ is the greedy choice
in the $\kappa$-th iteration. However, in the next iteration, if
$F_{\kappa+1}(C_2) \ge F_\kappa(C_3)$, then
$F_{\kappa+1}(C_2) \ge F_{\kappa+1}(C_j)$, $\forall j \ge 3$, which means that
there is no need to compute $F_{\kappa+1}(C_j)$, $\forall j \ge 3$. This saves
a large number of potential computations and improves the scalability of the
solution approach to large-scale systems.
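The lazy evaluation idea described above can be sketched with a priority queue of stale marginal gains. This is a minimal illustration under the assumption that the sets $C_i$ are given as Python sets; the function name is hypothetical and the code is not the implementation used in the paper or in [20].

```python
import heapq


def lazy_greedy_set_cover(sets):
    """Greedy set cover with lazy evaluation of marginal gains.

    Stale gains are upper bounds on the true marginal gains (by
    submodularity), so a refreshed gain that still beats the largest
    stale bound can be selected without re-evaluating the other sets.
    """
    covered = set()
    chosen = []
    # Max-heap via negated gains; the initial gain of a set is its size.
    heap = [(-len(s), i) for i, s in enumerate(sets)]
    heapq.heapify(heap)
    while heap:
        _, i = heapq.heappop(heap)
        gain = len(sets[i] - covered)  # refresh the stale bound
        if gain == 0:
            continue  # can never become useful again
        if not heap or gain >= -heap[0][0]:
            chosen.append(i)           # greedy choice confirmed
            covered |= sets[i]
        else:
            heapq.heappush(heap, (-gain, i))  # re-queue with fresh gain
    return chosen


if __name__ == "__main__":
    example = [{1, 2, 3}, {3, 4}, {4, 5}, {1, 5}]
    print(lazy_greedy_set_cover(example))
```

In the small example, the second set is re-queued after its gain drops from 2 to 1, and the third set is then selected after a single fresh evaluation.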


3.2 Identification solution 

One approach to solve the MTC problem is to first transform it to an
equivalent MSC problem [7], and then to solve the MSC problem using the lazy
greedy algorithm, as explained earlier. The greedy approach to solve the MTC
yields a $(2 \ln n + 1)$ approximation ratio algorithm, which is the best
possible [22]. A solution of the equivalent MSC is a solution to the original
MTC problem. Thus, a straightforward way to solve the identification problem
for link failures is to first obtain an equivalent detection problem, in which
each event represents a pair-wise link failure, and then utilize the greedy
approach to solve the corresponding detection problem. We call this the
transformed lazy greedy (TLG) and will use it in Section 6.2 to demonstrate
the simulation results. Next, we summarize the transformation of the MTC to
the MSC problem as outlined in [7].

3.2.1 Transformation of MTC to MSC 
Given an instance of the MTC, i.e., $\mathcal{L}$ and
$\mathcal{C} = \{C_i\}$, where $C_i \subseteq \mathcal{L}$, we transform the
MTC to the MSC by taking the following two steps:

• Create a new set of events:
$\mathcal{L}^* = \{\ell^*_{1,2}, \ldots, \ell^*_{n-1,n}\}$. For each unordered
pair $\{\ell_i, \ell_j\}$, define a new element $\ell^*_{i,j}$;
$\mathcal{L}^*$ consists of all such $\ell^*_{i,j}$'s.

• Create a new set of sensors' outputs:
$\mathcal{C}^* = \{C^*_1, \ldots, C^*_m\}$, where
$C^*_u = \{\ell^*_{i,j} : |\{\ell_i, \ell_j\} \cap C_u| = 1\}$,
$\forall u \in \{1, \ldots, m\}$. In other words, $\ell^*_{i,j} \in C^*_u$ if
and only if exactly one of $\ell_i$ or $\ell_j$ is in $C_u$. Moreover, for a
subset of sensors $S \subseteq \mathcal{S}$, we define
$\mathcal{C}^*_S = \{C^*_u : S_u \in S\}$.

Hence, we obtain a new identification matrix
$\mathcal{M}^*(\mathcal{L}^*, \mathcal{S})$ of dimensions
$\binom{n}{2} \times m$, in which each row corresponds to a pair-wise link
failure and each column represents a sensor's output. If a specific row in
$\mathcal{M}^*$ represents a pair $\ell_i, \ell_j$, then the $u$-th column
entry of the corresponding row in $\mathcal{M}^*$ is an exclusive OR of the
$(i, u)$-th and $(j, u)$-th entries of the influence matrix $\mathcal{M}$.
This illustrates the fact that to localize an event $\ell_i$, there must
exist, for every $j \neq i$, a sensor that distinguishes $\ell_i$ from
$\ell_j$ by producing different outputs for $\ell_i$ and $\ell_j$, i.e., if
the sensor output is 1 (resp. 0) in case of $\ell_i$, then its output for
$\ell_j$ is 0 (resp. 1).

Note that for a given subset of sensors $S$, the identification function,
which is the number of pair-wise link failures detected by $S$, is essentially
the same as the detection function of $S$ in the corresponding MSC instance,
i.e.,

$$f_I(\mathcal{C}_S) = f_D(\mathcal{C}^*_S), \quad (8)$$

where $f_D$ is defined as in (6). The normalized identification score, denoted
by $I_I(S)$, is computed by dividing $f_I$ by the total number of pair-wise
events, $|\mathcal{L}^*|$.
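The two transformation steps and the normalized identification score can be sketched as below. The function names are hypothetical, the influence matrix is assumed to be a list of 0/1 rows, and the three-event matrix in the usage example is made up for illustration.

```python
from itertools import combinations


def transform_to_set_cover(M):
    """Build the pair-wise (identification) matrix from an influence matrix.

    Each row of the result corresponds to an unordered pair of events
    (i, j) and is the element-wise exclusive OR of rows i and j of M.
    """
    pairs = list(combinations(range(len(M)), 2))
    M_star = [[a ^ b for a, b in zip(M[i], M[j])] for i, j in pairs]
    return pairs, M_star


def identification_score(M, sensors):
    """Fraction of pair-wise events distinguished by the chosen sensors."""
    pairs, M_star = transform_to_set_cover(M)
    detected = sum(1 for row in M_star if any(row[u] for u in sensors))
    return detected / len(pairs)


if __name__ == "__main__":
    # Hypothetical matrix with 3 events (rows) and 2 sensors (columns).
    M_small = [[1, 0], [1, 1], [0, 1]]
    print(transform_to_set_cover(M_small))
    print(identification_score(M_small, [0]), identification_score(M_small, [0, 1]))
```

Storing the full pair-wise matrix grows quadratically in the number of events, which is the cost that the augmented greedy approach of Section 4 avoids.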
3.2.2 Greedy approach based solution
Once the MTC problem has been transformed to the MSC problem, a
straightforward way to obtain a solution is to employ the greedy algorithm, as
outlined in Algorithm 1.


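Algorithm 1 Minimum Test Cover - Greedy Algorithm

1: Input: $\mathcal{C} = \{C_1, \ldots, C_m\}$, $C_i \subseteq \mathcal{L}$
2: Output: MTC: $\mathcal{C}_t \subseteq \mathcal{C}$
3: Initialize: $\mathcal{C}_t \leftarrow \emptyset$
4: Transform: the test cover instance to the set cover instance, i.e., from a
   given $\mathcal{L}$ and $\mathcal{C}$, obtain the corresponding
   $\mathcal{L}^*$ and $\mathcal{C}^*$ (Section 3.2.1).
5: Solve: using the greedy algorithm
   (a) Select $C^*_{i^*} \in \mathcal{C}^*$ (i.e., the sensor $i^*$) covering
       the most uncovered elements in $\mathcal{L}^*$.
   (b) Add to the current set:
       $\mathcal{C}_t \leftarrow \mathcal{C}_t \cup \{C_{i^*}\}$.
   (c) Repeat until all elements in $\mathcal{L}^*$ are covered, or no new
       element in $\mathcal{L}^*$ can be covered by any
       $C^*_i \in \mathcal{C}^*$.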


As in the case of the MSC problem, the lazy greedy approach, which exploits
the submodularity property of the set cover problem, can be utilized. However,
if there are $n$ link failures that need to be localized, then the
corresponding set cover instance contains $\binom{n}{2}$ events, and the time
complexity of the greedy approach in Algorithm 1 is
$O\big(m \binom{n}{2}\big)$, where $m$ is the total number of sensors. Even
for small-sized networks with a limited number of possible link failures, this
approach becomes quite inefficient owing to a large number of computations
required. Moreover, even with lazy greedy evaluation, this approach does not
achieve the desired computational efficiency for realistic sizes of the
failure event set. In the next section, we focus on improving the
computational time of the solution of the MTC problem.


4 Augmented greedy MTC solution 

The main idea behind the augmented greedy approach is to achieve a
computationally efficient approximation algorithm. We do so by avoiding the
complete transformation of the MTC to the MSC and directly evaluating the
function (8), thus eliminating the need to pre-compute the identification
matrix $\mathcal{M}^*(\mathcal{L}^*, \mathcal{S})$. For example, for a network
with $m = 2000$ and $n = 2000$, we would require approximately 4 GB of
computer memory to store the transformed MSC.
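The memory figure quoted above can be reproduced with a back-of-the-envelope calculation. The assumption of one byte per matrix entry is ours; the text does not state the storage format.

```python
from math import comb


def transformed_matrix_bytes(n, m, bytes_per_entry=1):
    """Storage needed for a dense pair-wise matrix with C(n, 2) rows and m columns.

    bytes_per_entry=1 is an assumption of this sketch (one byte per
    boolean entry); a bit-packed layout would need one eighth of that.
    """
    return comb(n, 2) * m * bytes_per_entry


if __name__ == "__main__":
    size = transformed_matrix_bytes(n=2000, m=2000)
    print(f"{size / 1e9:.3f} GB")
```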


In each iteration of the greedy algorithm for the MTC solution, a sensor that
covers (detects) the most pair-wise link failures from a total of
$\binom{n}{2}$ pair-wise failures is selected. Thus, $O\big(\binom{n}{2}\big)$
comparisons are made in a single iteration for each potential sensor. In the
augmented greedy approach, we avoid this by significantly reducing the number
of comparisons made in each step.


In fact, for each sensor, the number of comparisons made 

in a single iteration are always bounded by O 

where $k$ is the maximum number of link failures that are detected by any
sensor, and $\kappa$ is the number of sensors that are included in the test
cover until that iteration. Since $k$ is typically much smaller than $n$, a
large number of computations are thus avoided in each iteration.

To explain our approach, we first observe that a sensor $i$ that detects $k$
events (i.e., $|C_i| = k$) can distinguish between the $k$ detected events and
the $(n - k)$ undetected events. Thus, such a sensor detects $k(n - k)$
pair-wise events (i.e., $|C^*_i| = k(n - k)$). Unlike the detection problem,
in which a sensor with a large $k$ is desirable, a sensor that detects a large
number of failures is not always useful for identification. Figure 3 shows the
number of pair-wise events detected by a sensor as a function of the number of
(single) events detected by the sensor. The maximum number of pair-wise
events, which are link failures in our case, is detected when $k = n/2$.




Fig. 3. The number of pair-wise link detections as a function 
of the number of detected events. 
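The shape of the curve in Fig. 3 follows directly from the count $k(n - k)$, which is a concave quadratic in $k$ with its maximum at $k = n/2$. A brief sketch (the function name is ours) confirms this for the ten-event example:

```python
def pairwise_detections(k, n):
    """Number of pair-wise events separated by a sensor detecting k of n events."""
    return k * (n - k)


if __name__ == "__main__":
    n = 10
    counts = [pairwise_detections(k, n) for k in range(n + 1)]
    best_k = max(range(n + 1), key=lambda k: pairwise_detections(k, n))
    print(counts)
    print(best_k, pairwise_detections(best_k, n))
```

For $n = 10$ the best single sensor detects exactly five events and separates 25 of the 45 pairs, which matches the first iteration of Example 3.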

Moreover, if a sensor $i$ is included in a test cover and
$\ell_u, \ell_v \in C_i$, then a distinction between the occurrence of
$\ell_u$ and $\ell_v$ is not possible through the sensor $i$. Thus, a set of
sensors that can distinguish between events $\ell_u, \ell_v \in C_i$, or
equivalently that can detect pair-wise events corresponding to the events in
$C_i$, also needs to be included in the test cover. Based on this observation,
we suggest an augmented greedy approach to compute the test cover without
computing the events $\mathcal{L}^*$ a priori.

Let $\mathcal{C}_t \subseteq \mathcal{C}$ be the test cover until the current
iteration, and $\mathcal{L}_{cov}$ be the set of link failures detected by the
sensors that are included in the test cover, i.e.,
$\mathcal{L}_{cov} = \bigcup_{C_u \in \mathcal{C}_t} C_u$.
Thus, the utility of adding $C_i$ to $\mathcal{C}_t$ (i.e., adding sensor
$S_i$ to the test cover) in each iteration is based on the following two
factors:

(i) How many pair-wise link failures corresponding to the links which are not
included in $\mathcal{L}_{cov}$ can be detected by $C_i$? We define this value
as $x_i$.

(ii) How many pair-wise link failures corresponding to the links already
included in $\mathcal{L}_{cov}$ can be detected by $C_i$? We define this value
as $y_i$.
The overall utility of adding sensor $S_i$ to the test cover, denoted by
$w_i$, is the sum of $x_i$ and $y_i$. The sensor $S_{i^*}$ that maximizes this
overall utility (let $w_{i^*}$ denote the maximum utility) is then included in
the test cover, and $\mathcal{L}_{cov}$ is updated to
$\mathcal{L}_{cov} \leftarrow \mathcal{L}_{cov} \cup C_{i^*}$. Now, we state
how to compute $x_i$ and $y_i$ in the $j$-th iteration.

(i) Computing $x_i$ - If $n_j$ is the number of link failures that are not yet
included in $\mathcal{L}_{cov}$ (i.e., $n_j = n - |\mathcal{L}_{cov}|$), and
$C_i$ contains $k_{i,j}$ of such link failures, then
$x_i = k_{i,j}(n_j - k_{i,j})$. Note that computing $x_i$ is straightforward
and does not require computing pair-wise link failures from a given set of
link failures.

(ii) Computing $y_i$ - If a sensor $u$ is already included in the test cover,
then the pair-wise link failures corresponding to the links in $C_u$ remain
undetected by that sensor. Thus, $y_i$ counts how many of such pair-wise link
failures can be detected by the inclusion of sensor $i$ in the test cover. To
make it precise, we proceed as follows:

If $X$ and $Y$ are two sets, then we define:

$\beta(X)$ = set of all 2-element subsets of $X$,

and $\alpha(Y, \beta(X)) = \{a \in \beta(X) : |Y \cap a| = 1\}$.


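Algorithm 2 Minimum Test Cover - Augmented Greedy Algorithm

 1: Input: $\mathcal{C} = \{C_1, \ldots, C_m\}$, $C_i \subseteq \mathcal{L}$
 2: Output: MTC: $\mathcal{C}_t \subseteq \mathcal{C}$
 3: Initialization: $\mathcal{L}_{cov} = \emptyset$; $\mathcal{C}_t = \emptyset$;
    $n = |\mathcal{L}|$; $w_{i^*} = 1$; $J = 0$
 4: while $w_{i^*} > 0$ do
 5:   $n_j \leftarrow n - |\mathcal{L}_{cov}|$
 6:   for all $i$ do
 7:     $X_i \leftarrow (C_i \setminus \mathcal{L}_{cov})$, $k_{i,j} \leftarrow |X_i|$
 8:     $x_i \leftarrow k_{i,j}(n_j - k_{i,j})$
 9:     $Y_i \leftarrow C_i \cap \mathcal{L}_{cov}$
10:     $y_i \leftarrow \sum_{t=0}^{J-1} |\alpha(Y_i, G_t)|$
11:     $w_i = x_i + y_i$
      end for
12:   $w_{i^*} \leftarrow \max_i w_i$
13:   if $w_{i^*} > 0$ then
14:     $\mathcal{C}_t \leftarrow \mathcal{C}_t \cup \{C_{i^*}\}$
15:     $\mathcal{L}_{cov} \leftarrow \mathcal{L}_{cov} \cup C_{i^*}$
16:
17:     for $t = 0$ to $J - 1$ do
18:       $G_t \leftarrow G_t \setminus \alpha(Y_{i^*}, G_t)$
        end for
19:
      end if
    end while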



Here, $\alpha(Y, \beta(X))$ is a set consisting of such 2-element subsets of
$X$ that have exactly one common element with $Y$. For instance, if
$X = \{1, 2, 3\}$ and $Y = \{1, 3\}$, then
$\beta(X) = \{\{1,2\}, \{1,3\}, \{2,3\}\}$, and
$\alpha(Y, \beta(X)) = \{\{1,2\}, \{2,3\}\}$.
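The two set operators can be written down directly. The sketch below uses the names `beta` and `alpha` for readability and represents each pair as a `frozenset`; both are choices of this sketch. It reproduces the instance given in the text.

```python
from itertools import combinations


def beta(X):
    """All 2-element subsets of X."""
    return {frozenset(pair) for pair in combinations(X, 2)}


def alpha(Y, pairs):
    """Pairs that share exactly one element with Y."""
    Y = set(Y)
    return {a for a in pairs if len(a & Y) == 1}


if __name__ == "__main__":
    X, Y = {1, 2, 3}, {1, 3}
    print(sorted(sorted(a) for a in beta(X)))
    print(sorted(sorted(a) for a in alpha(Y, beta(X))))
```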

To compute $y_i$, we first compute the set of link failures common to $C_i$
and $\mathcal{L}_{cov}$ and call it $Y_i = C_i \cap \mathcal{L}_{cov}$. Now,
if sensor $u$ is already included in the test cover, and
$G_u \subseteq \beta(X_u)$ is the set of undetected pair-wise link failures
corresponding to the links in $X_u \subseteq C_u$, then

$$y_i = \sum_{C_u \in \mathcal{C}_t} |\alpha(Y_i, G_u)|.$$

The complete algorithm is stated in Algorithm 2.

Example 3 (Augmented greedy) Consider the network shown in Figure 10. Let $k_i$ be the number of failure events detected by the sensor $i$, i.e., $|C_i| = k_i$, where $C_i \subseteq \mathcal{L}$. In the first iteration ($j = 1$) of the while loop, the size of the event space is $n = 10$, and $k_{i,j} = k_i$, $\forall i$. Then, the number of new pair-wise link failures detected by the sensor $i$ is given by $x_i = k_{i,j}(n - k_{i,j})$. Since there are no sensors in the test cover in the first iteration, $y_i = 0$ for all the sensors. The maximum value of $w_i$ is attained for the sensors 1 and 2 with $w_1 = w_2 = x_1 = x_2 = 5(10 - 5) = 25$. We include sensor 1 in the test cover, thus $\mathcal{C}^* = \{C_1\}$ after the first iteration of the while loop. The set of all undetected pair-wise events for sensor 1, $G_1 = \{\{1,2\}, \{1,3\}, \cdots, \{4,5\}\}$, is then updated. Finally, we update the set of covered events as $C_{cov} = \{1,2,3,4,5\}$. For the second iteration, i.e., $j = 2$, the size of the event space is updated as $n_2 = 5$. A complete account of the states of the variables of the algorithm for the example is provided in [27]. The algorithm returns the test cover consisting of sensors $\{1,2,3,5\}$, which uniquely identify all link failures.
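The steps of Algorithm 2 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `augmented_greedy`, the 0-based sensor indices, and the rule of breaking ties in favor of the lowest sensor index are choices made here. The sensor sets are those tabulated for the illustrative example, with events numbered 1 to 10.

```python
from itertools import combinations


def augmented_greedy(sensor_sets, n_events):
    """Greedy test cover that never enumerates all event pairs up front.

    sensor_sets[i] is the set of events detected by sensor i.
    Returns the 0-based indices of the selected sensors, in selection order.
    """
    covered = set()   # C_cov: events seen by at least one selected sensor
    groups = []       # one set of still-undetected pairs per selected sensor
    chosen = []
    while True:
        n_j = n_events - len(covered)
        best_i, best_w = None, 0
        for i, c_i in enumerate(sensor_sets):
            k = len(c_i - covered)          # k_{i,j} = |X_i|
            x = k * (n_j - k)               # pairs split among uncovered events
            y_set = c_i & covered           # Y_i
            # y_i: stored pairs with exactly one element in Y_i
            y = sum(1 for g in groups for p in g if len(p & y_set) == 1)
            if x + y > best_w:              # strict '>' keeps lowest index on ties
                best_i, best_w = i, x + y
        if best_i is None:                  # maximum utility is zero: stop
            return chosen
        c_star = sensor_sets[best_i]
        x_star, y_star = c_star - covered, c_star & covered
        # Remove the pairs the new sensor separates, then open its own group.
        groups = [{p for p in g if len(p & y_star) != 1} for g in groups]
        groups.append({frozenset(p) for p in combinations(sorted(x_star), 2)})
        covered |= c_star
        chosen.append(best_i)


# Sensor sets C_1..C_8 of the illustrative example.
EXAMPLE_SETS = [
    {1, 2, 3, 4, 5}, {1, 2, 3, 6, 8}, {1, 2, 4, 5, 6, 7, 9},
    set(range(2, 11)), {1, 3, 4, 6, 7, 8, 10}, {2, 4, 5, 7, 9, 10},
    set(range(4, 11)), {3, 6, 7, 8, 9, 10},
]
selected = augmented_greedy(EXAMPLE_SETS, 10)
print([i + 1 for i in selected])  # 1-based sensor labels
```

With the tie-breaking rule assumed above, the sketch selects sensors 1, 2, 3, and 5, in agreement with the example. A different tie-breaking rule may return a different cover.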


The augmented greedy approach in Algorithm 2 pro¬ 
duces the same solution as the greedy approach in Algo¬ 
rithm 1. Thus, Algorithm 2 has the same approximation 
ratio as the standard greedy algorithm, which has been 
proven to be the best possible. 


Since a large number of computations are avoided in the execution of Algorithm 2, it is more efficient than the simple greedy. In contrast to the $O\left(\binom{n}{2}\right)$ comparisons performed in each iteration for a sensor in Algorithm 1, $O\left(\sum_{i=1}^{m_j} \binom{k_i}{2}\right)$ comparisons are done in each iteration of Algorithm 2. Here, $n$ is the total number of link failures, $k_i$ is the number of link failures detected by the sensor $i$ (i.e., $k_i = |C_i|$), and $m_j$ is the number of sensors included in the test cover until that iteration. Thus, if $k = \max_i(k_i)$, then Algorithm 2 is at least $n/k$ times faster than the simple greedy approach, as shown below. Moreover, typically $k \ll n$ in the case of link failure detection in water distribution networks; thus, the $n/k$ factor turns out to be a significant improvement.


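Proposition 4.1 Let $\sum_i k_i = n$, and $k = \max_i(k_i)$, then

$\binom{n}{2} \Big/ \sum_i \binom{k_i}{2} \ge \frac{n}{k}$.   (9)

Proof

$\sum_i \binom{k_i}{2} = \frac{1}{2}\left(\sum_i k_i^2 - \sum_i k_i\right) \le \frac{1}{2}\left(k \sum_i k_i - \sum_i k_i\right) = \frac{n(k-1)}{2} \le \frac{k(n-1)}{2} = \frac{k}{n}\binom{n}{2}$.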

We note that Algorithm 2 is somewhat similar to the 
two-step greedy algorithm presented in [7]. However, in 
our approach, both Xi and yi are computed in the same 
iteration resulting in a more efficient implementation. 

5 Application to a benchmark network 

We first test our approach on a medium-size water network. Net1 is a benchmark system that has been extensively studied in the context of sensor placement for water quality [25]. The system consists of 126 nodes, 168 pipes, one reservoir, one pump, and two storage tanks, and its layout is shown in Figure 4. The system supplies
a daily demand of 5.15 x /day] and has a total 

pipe length of $37.5 \times 10^{3}$ [m].

For all our simulations, we consider a single failure event occurring at the center of each pipe and enumerate all possible failure events. For the detection problem, when a fully calibrated transient model of the network is not available, we approximate the disturbance propagation using a simple distance-based model emulating the dissipation of the pressure wave with the distance from the origin. As in [9], our influence model is based on the shortest distance threshold model, assuming that the disturbance in pressure can be sensed within a specified distance from the location of the burst, i.e., $y_{S_i}(\ell_j) = \{1 \mid d(S_i, \ell_j) < \epsilon\}$, where $d$ is the length of the shortest path between two locations $S_i$ and $\ell_j$, and $\epsilon$ is some threshold. Figure 4 shows an example of the influence range (in red) of a burst in LINK-126 of the network for a threshold distance of $\epsilon = 1000$ [m], i.e., a sensor located in the red region can detect the pipe failure.
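The distance threshold model can be sketched as below. The sketch is an illustration under assumptions that are not stated in the paper: the network is given as a list of `(node_u, node_v, length)` pipes, sensors sit at nodes, and the distance from a sensor to a failure at the center of a pipe is taken as the shortest-path distance to the nearer end node plus half the pipe length. The function names and the toy network are hypothetical.

```python
import heapq


def shortest_distances(adj, source):
    """Dijkstra shortest-path lengths from source over a weighted graph."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in adj[u]:
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def influence_sets(pipes, sensor_nodes, eps):
    """For each sensor node, the indices of pipes whose burst it can sense."""
    adj = {}
    for u, v, length in pipes:
        adj.setdefault(u, []).append((v, length))
        adj.setdefault(v, []).append((u, length))
    inf = float("inf")
    result = {}
    for s in sensor_nodes:
        dist = shortest_distances(adj, s)
        result[s] = {
            j for j, (u, v, length) in enumerate(pipes)
            if min(dist.get(u, inf), dist.get(v, inf)) + length / 2.0 < eps
        }
    return result


# Toy line network A-B-C-D with pipe lengths in meters (hypothetical data).
toy_pipes = [("A", "B", 400.0), ("B", "C", 400.0), ("C", "D", 500.0)]
print(influence_sets(toy_pipes, ["A", "D"], eps=1000.0))
```

Each returned set plays the role of $C_i$, the set of link failures detected by sensor $i$, which is the input to the greedy algorithms.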



Fig. 4. Layout of Net1 and propagation of failure in LINK-126

Assuming that a sensor can be placed at any of the 126 network nodes and any of the 168 network pipes can fail, we solve the MTC problem, as described previously in Sections 2.3, 3.2, and 4. Figure 5 shows the normalized identification score, $I_I$, defined in Section 3.2.1, as a function of the number of sensors using the greedy approach. As noted in Section 3.1, we observe that the identification score function exhibits a diminishing return property. The maximum identification score of 0.99 is attained with 48 sensors.

Fig. 5. Identification score for Net1

Fig. 6. Localization performance for Net1


The identification score of the network is not sufficient to evaluate the quality of the design, since it does not indicate how many events are uniquely identified and how many are not. For this reason, we suggest two complementary metrics for evaluating the performance of the sensor network design:

Localization score - Let $L \subseteq \mathcal{L}$ be a subset of link failures for which the outputs of the sensors in $S$ are the same, i.e., $y_S(\ell_i) = y_S(\ell_j)$, $\forall \ell_i, \ell_j \in L$. We call such a subset of link failures $L$ a localization set. A localization set can be associated with every unique vector of sensors' outputs. The localization score is the total number of localization sets obtained under the sensor configuration $S$. We note that it is not possible to distinguish between the failure events in a localization set by merely observing the outputs of sensors. We define the normalized localization score, $I_L(S)$, as the ratio of the total number of localization sets formed under the sensor configuration $S$ to the total number of failure events. Ideally, the normalized localization score should be equal to 1, indicating that each fault can be uniquely identified.

Localization size - The number of faults associated with a unique output of sensors, i.e., the number of elements in a localization set $L$. A larger localization size means that it would be more difficult to identify the location of the fault, and additional local inspection methods might be needed. We define the worst set size, $I_w(S)$, as the size of the largest localization set. For complete localization it is required that $I_w(S) = 1$, indicating that all faults can be distinguished from each other, and therefore can be uniquely detected.















Example 4 (Localization score) Continuing Example 2 for the two-sensor design $S = \{S_2, S_4\}$, three localization sets are formed, i.e., $L_1 = \{\ell_1\}$, $L_2 = \{\ell_4, \ell_5, \ell_7, \ell_9, \ell_{10}\}$, $L_3 = \{\ell_2, \ell_3, \ell_6, \ell_8\}$. The corresponding localization sizes are $|L_1| = 1$, $|L_2| = 5$, $|L_3| = 4$. The normalized localization score is thus $I_L = 3/10$ and the worst localization size is $I_w = 5$. This means that if an event is detected, it can be assigned to one of three distinct groups, but further distinction within the groups is not possible, with the largest indistinguishable group containing 5 links. With the four-sensor design, $S^* = \{S_1, S_2, S_3, S_5\}$, the optimal normalized localization score and the maximum localization size of 1 are achieved, and we observe ten unique outputs of sensors, each associated with a unique failure event.
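Both localization metrics follow from grouping the events by their vector of sensor outputs, as in the sketch below. The function name `localization_metrics` and the 0-based sensor indices are assumptions made here, and the sensor sets are those tabulated for the illustrative example. In this sketch, events that no chosen sensor detects share the all-zero output and therefore form one localization set.

```python
from collections import defaultdict


def localization_metrics(sensor_sets, chosen, n_events):
    """Return (normalized localization score I_L, worst set size I_w)."""
    groups = defaultdict(list)
    for event in range(1, n_events + 1):
        # Output vector of the chosen sensors for this event (0/1 entries).
        signature = tuple(int(event in sensor_sets[i]) for i in chosen)
        groups[signature].append(event)
    sizes = [len(events) for events in groups.values()]
    return len(sizes) / n_events, max(sizes)


# Sensor sets C_1..C_8 of the illustrative example (events 1..10).
EXAMPLE_SETS = [
    {1, 2, 3, 4, 5}, {1, 2, 3, 6, 8}, {1, 2, 4, 5, 6, 7, 9},
    set(range(2, 11)), {1, 3, 4, 6, 7, 8, 10}, {2, 4, 5, 7, 9, 10},
    set(range(4, 11)), {3, 6, 7, 8, 9, 10},
]
print(localization_metrics(EXAMPLE_SETS, [1, 3], 10))        # design {S_2, S_4}
print(localization_metrics(EXAMPLE_SETS, [0, 1, 2, 4], 10))  # design {S_1, S_2, S_3, S_5}
```

For the two-sensor design the sketch gives three localization sets with a worst size of 5, and for the four-sensor design every event has its own output vector.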

Figure 6a shows the normalized localization score as a function of the number of sensors. The highest localization score of 0.65 is achieved when 48 sensors are installed. This result indicates that 110 unique vectors of sensor outputs are associated with the 168 failure events. Figure 6b shows the worst, median, and minimum localization set sizes as a function of the number of sensors for Net1. We observe that initially the sizes of the localization sets decrease rapidly with the number of sensors, until the worst localization set size reaches a plateau at 20 sensors and does not improve further. This implies that deploying more sensors might improve local performance, but will not improve the overall network localization performance, making further deployment of sensors unattractive for the water utility from the cost viewpoint.

6 Application to real networks 

We tested our approach on a batch of real water networks. Principal information is listed in Table 1 and the complete data can be obtained from [15] for Nets 2-10 and from [2] for Nets 1, 11, 12. In all our simulations we again assume that a single failure can occur at each of the network links and that sensors can be placed at each of the network nodes, and set the distance threshold to $\epsilon = 1000$ [m].

Table 1
Network data

Network   Length [km]   Demand [/day]   No. of pipes   No. of nodes
Net1          37.56           5.15             168            126
Net2          91.29           7.59             366            269
Net3          96.58           8.58             496            420
Net4         137.05           5.78             603            481
Net5         123.20           6.20             644            543
Net6         166.60           5.66             907            791
Net7         153.30           8.93             940            778
Net8         152.25           7.91            1124            811
Net9         260.24           5.67            1156            959
Net10        247.34           9.33            1614           1325
Net11        760.89          71.88            3032           1891
Net12       1844.04         108.8            14822          12523



Fig. 7. Layout of Net9 and example of the detection and 
localization sets for three sensors 

6.1 MSC vs. MTC

First, we compare the sensor placement design for the identification problem obtained from our approach with the design for the detection problem, i.e., MTC vs. MSC (Sections 2.2, 2.3). We demonstrate our results using Net9, from the Kentucky dataset. Although the system supplies a similar daily demand as Net1, it is spatially more distributed with approximately 260 [km] of pipes. The network layout and main features are shown in Figure 7 and Table 1.

Figure 7 schematically illustrates the difference between the MTC and MSC problem formulations in the context of Net9. Consider three sensors installed in the network. Figure 7 demonstrates the seven localization sets corresponding to the seven unique sensor states, $[0,0,1], \cdots, [1,1,1]$, and the detection set, which is the union of the localization sets. Whereas the detection problem tries to maximize the detection set, the identification problem aims to identify distinct subsets.

Figure 8 provides a comparison between the detection and localization scores for the MTC (blue circles) and MSC (red squares) designs. For the detection problem, 25 sensors are sufficient to cover the entire system; hence, we also select the first 25 sensors for the identification problem and compare their performance. From Figure 8a it can be seen that the two designs overlap for the first 7 sensors and that the MSC design only slightly outperforms the MTC design when comparing the detection scores for a higher number of sensors. At the same time, the MTC design significantly outperforms the MSC design when comparing the localization scores, as shown in Figure 8b. Similar results were attained for the other networks.

6.2 Augmented greedy vs. transformed lazy greedy 

Next, we compare the solution approach based on the augmented greedy (AG) (Section 4) and the transformed lazy greedy (TLG) (Section 3.2). Table 2 lists the running times (Intel Core i7, 2.9 GHz, 16 GB of RAM) for the augmented greedy and the transformed lazy greedy approaches. For Nets 1-10, the new algorithm is roughly 2.4 to 8.3 times faster than the transformed lazy greedy approach (see Table 2), depending on the maximum number of events detected by any sensor (see Proposition 4.1). The solutions obtained using the two approaches were identical. For Nets 11-12, we were not able to apply the TLG due to the memory requirements and applied only the AG, which further emphasizes the advantage of the AG approach.

Fig. 8. MTC versus MSC performance for Net9

Finally, Table 2 lists the maximum number of sensors and the corresponding four performance scores: normalized detection $I_D$, identification $I_I$, and localization $I_L$ scores, and worst localization set size $I_w$. For all networks, the layouts and the simulation plots illustrating these metrics as a function of the number of sensors are available in [27]. These results demonstrate that: (1) The number of sensors required solely for detection purposes is significantly lower than the number of sensors required for localization. (2) Between the two localization measures, $I_L$ and $I_w$, the localization score is more conservative than the worst set size, requiring a larger number of sensors. For example, consider the design for Net9: to detect 95% of the events, i.e., $I_D = 0.95$, 18 sensors are sufficient, whereas 79 sensors are required to achieve $I_L = 0.5$, and 38 to achieve $I_w = 20$. This is observed for all tested networks.


Table 2
Simulation results

Network   No. of sensors   I_D    I_I    I_L    I_w   TLG [min]   AG [min]
Net1             48        0.99   0.99   0.65    12       0.23       0.08
Net2             98        0.99   1.00   0.86    12       2.39       0.58
Net3            134        0.99   1.00   0.86     7       6.93       1.65
Net4            138        0.99   1.00   0.91     8      11.98       4.93
Net5            164        0.99   1.00   0.86     6      15.58       3.85
Net6            258        1.00   1.00   0.86     8      45.46       6.31
Net7            139        1.00   1.00   0.83     8      49.12       9.31
Net8            195        1.00   1.00   0.70     8      80.55      28.07
Net9            359        1.00   1.00   0.87     6      91.57      11.06
Net10           408        1.00   1.00   0.89    14     257.41      39.48
Net11           717        1.00   1.00   0.69     9        -        50.53
Net12          1000*       1.00   1.00   0.38    17        -      1800

TLG - transformed lazy greedy; AG - augmented greedy;
* terminated after 1000 iterations


7 Related work 

Event detection in water networks. In the urban water sector, the majority of previous works focused on the sensor placement for detecting hypothetical contamination events assuming perfect sensors capable of detecting all types of contaminants [5,11]. In a related work
[16], to detect the presence of contaminants in large wa¬ 
ter distribution systems, the notion of penalty reduc¬ 
tion function was introduced to realize various objec¬ 
tive functions such as reduction of detection time and 
the expected population affected. Submodularity of the 
penalty reduction function was then used to solve sensor 
placement problems efficiently and with provable guar¬ 
antees. Moreover, various data and model-driven tech¬ 
niques also exist that are applied for system’s state esti¬ 
mation and event detection and isolation [10,32]. The ba¬ 
sic premise in these methods is that once the sensors are 
in place, data is collected and transmitted in real-time. 
The difference between measurements, such as pressure 
[28] and flow [31], and their estimated values obtained 
using the network hydraulic model, is then computed. 
Model based leakage detection techniques are employed 
primarily on the operational side with the objective to 
efficiently utilize available measurements along with the 
available system model to determine the system faults. 

Our approach is somewhat related to [9,33], which con¬ 
sider pipe bursts as failure events. In [9], detection of 
events in networks is studied using distance decaying 
sensing function. The problem is formulated as a contin¬ 
uous p-median facility location problem and solved us¬ 
ing a gradient descent algorithm. However, in contrast 
to [9], in which only the detection problem is considered, 
we consider detection as well as location identification of 
link failures. In [33] both the detection and location iden¬ 
tification of failure events are considered in the problem 
formulation. 

In this work, we consider the placement of online high-rate pressure sensors. Additional surface and inline detection techniques include acoustic, umbilical, and autonomous robots. These tools are principally used to verify and pinpoint the location of the burst; their operation is typically time consuming and expensive, and they are not suitable for continuous operation [39]. Ideally, flow meters can also be used for detecting and localizing leaks in water networks. However, these are more expensive and can typically be installed on main pipelines only at the inlets of sub-networks [23]. Furthermore, most flow meters do not react instantaneously to changes in flow, and hence are more suitable for persistent leaks [29].

Approximation algorithms. The sensor placement problem is not unique to the water sector and can be found in many engineering applications. Sensor placement is in essence a combinatorial optimization problem, in which a minimum number of sensors are deployed to minimize the uncertainty about the events of interest. The dominant approach is to cast the sensor placement problem as the classical minimum set cover (MSC) problem, in which, given a set of $n$ elements and a collection of $m$ subsets, the goal is to select as few subsets as possible such that their union covers all elements. The MSC problem is known to be NP-hard [22]. The greedy algorithm guarantees the best possible approximation ratio of $(\ln n + 1)$. A key feature in the efficient and practically feasible greedy algorithm is exploiting the submodular property, i.e., decreasing marginal utility of the objective function. Extensive literature exists on the greedy approximation for submodular functions. In [17], a mutual information criterion was proposed to select the most informative sensors to monitor a spatial phenomenon modeled by a Gaussian process. The submodularity property of the criterion, as shown in [24], was then exploited to obtain a polynomial time algorithm guaranteeing a constant factor approximation of the optimal sensor set.

Model-based diagnosis. Fault detection and identification (FDI) and consistency based diagnosis (DX) are two distinct approaches which rely on computing sets of events in a faulty system based on the discrepancies between the observed and predicted system behavior [6]. In the FDI community, fault diagnosis is captured by localizing faults based on residuals that capture these faults. The problem is then to select a set of residual generators that are sensitive to the set of faults [18,30,35]. In the DX community, the diagnosis is derived by computing a set of conflicts that capture the faulty components that explain the observed failures [3,8,12]. To compute the minimum set of residual generators or the minimum set of conflicts, the problem often relies on the MSC or the minimum hitting set (MHS) formulation. The MSC problem is equivalent to the MHS, in which, given the same input as in the MSC, the goal is to find the smallest subset of elements that hits (i.e., has a non-empty intersection with) every subset [6].

In previous works [18,30,35] the isolation solution is ob¬ 
tained by first computing the set of all pair-wise faults 
from a given set of faults, and then using greedy heuris¬ 
tics to solve the MSC or the MHS problems. This is sim¬ 
ilar to the TLG approach described in Section 3.2. Com¬ 
puting all pair-wise events is the main computational 
bottleneck, especially when applied to large scale net¬ 
works. The AG presented in Section 4 is a faster imple¬ 
mentation of the greedy approach for the solution of the 
MTC. Its main feature is avoiding the transformation of 
the MTC to the MSC/MHS, which makes it more suit¬ 
able for large-scale distributed systems, as demonstrated 
for Nets 11-12 in Table 2. 

8 Conclusions and future work 

In this work, we focused on the sensor placement for 
fault location identification in water networks. We cast 
the problem as the minimum test cover problem and sug¬ 
gested a fast solution approach. Additionally, we tested 
and analyzed the solutions using multiple performance 
criteria for a suite of real water networks. The outcomes 
of our approach could provide a better diagnosis of failure events in terms of improved localization and response to failure events in operational mode, and could significantly reduce potential physical losses and service disruptions in water networks. In this work we assumed perfect sensing information; future extensions will include sensor placement robust to erroneous and corrupt data.

Nomenclature 

$C_i^*$   set of pair-wise link failures detected by the sensor $i$
$C_i$   set of link failures detected by the sensor $i$
$\mathcal{C}$   collection of all $C_i$'s
$\mathcal{C}^*$   collection of all $C_i^*$'s
$f_D$   detection function
$f_I$   identification function
$h$   hydraulic head
$I_D$   normalized detection score
$I_I$   normalized identification score
$I_L$   normalized localization score
$I_w$   number of elements in the largest localization set
$k$   maximum number of link failures detected by any sensor
$\ell_j$   (failure) event
$\ell_{ij}$   unordered pair of (failure) events $\ell_i$ and $\ell_j$
$\mathcal{L}$   set of all (failure) events
$\mathcal{L}^*$   set of all pair-wise (failure) events
$L$   localization set
$m$   total number of sensors
$\mathcal{M}$   influence matrix
$\mathcal{M}^*$   transformed influence matrix
$n$   total number of events
$p$   pressure
$q$   flow
$S_i$   the location of the sensor
$S$   set of all sensors
$y_S$   outputs of sensors in the set $S$
9 Transient modeling

Unsteady state flow in a closed conduit can be described by mass and momentum equations formulated as [6]:

$\frac{\partial h}{\partial t} + \frac{a^2}{gA} \frac{\partial q}{\partial x} = 0$   (10)

$\frac{1}{gA} \frac{\partial q}{\partial t} + \frac{\partial h}{\partial x} + \frac{c\, q|q|}{2gDA^2} = 0$   (11)

where $h$ is the hydraulic head [m], $q$ is the volumetric flow rate [m^3/s], $g$ is the gravitational acceleration [m/s^2], $x$ is the distance along the pipe [m], $t$ is the time [sec], $a$ is the wave speed in the conduit [m/s], $c$ is a friction factor, $D$ is the pipe diameter [m], and $A$ is the pipe cross-sectional area [m^2].

The method of characteristics (MOC) is one of the most common numerical techniques used to approximate the solution of the hydraulic transient. Additional techniques used are finite differences and the node characteristic method. A detailed derivation of the governing equations and the solution scheme can be found in [4,6]. The MOC transforms partial differential equations into ordinary differential equations that apply along specific lines (characteristics), $C^+$ and $C^-$, in the space-time, $x$-$t$, plane. Two characteristic equations are solved explicitly to compute the head and flow, $h_*, q_*$, at a new point in time and space, $(\cdot)_*$, given that the conditions at a previous time step along the characteristic grid are known, i.e., $h_+, q_+$ and $h_-, q_-$. For a given pipe, the two compatibility equations are formulated as:

$C^+ : \frac{a}{gA}(q_* - q_+) + (h_* - h_+) + \frac{c \Delta x}{2gDA^2} q_+|q_+| = 0$   (12a)

$C^- : \frac{a}{gA}(q_* - q_-) - (h_* - h_-) + \frac{c \Delta x}{2gDA^2} q_-|q_-| = 0$   (12b)

Rearranging equations (12a) and (12b) we get:

$C^+ : h_* = C_P - b q_*$   (13a)

$C^- : h_* = C_M + b q_*$   (13b)

where

$C^+ : C_P = h_+ + q_+(b - r|q_+|)$   (14a)

$C^- : C_M = h_- - q_-(b - r|q_-|)$   (14b)

and

$b = \frac{a}{gA}$   (15)

$r = \frac{c \Delta x}{2gDA^2}$   (16)

$b$ is a function of the physical characteristics of the pipe and the wave speed of the fluid in the conduit. The parameter $b$ can be viewed as the characteristic impedance, which is associated with the transient state. $r$ is a function of the physical characteristics of the pipe that can be viewed as the pipe's resistance coefficient, and is associated with the steady state. If $b = 0$, the set of equations (13) is reduced to the steady state equations, where the head losses along the pipe occur only due to friction.


We designate the points $(\cdot)_+$, $(\cdot)_-$, $(\cdot)_*$ on the space-time grid of characteristics. If $i$ and $t$ are indices for space and time, respectively, then: $(\cdot)_* \to (h_{i,t+1}, q_{i,t+1})$, $(\cdot)_+ \to (h_{i-1,t}, q_{i-1,t})$, $(\cdot)_- \to (h_{i+1,t}, q_{i+1,t})$. Then, solving first for $h_{i,t+1}$ by eliminating $q_*$ in (13), for a single node in the numerical grid, we get:

$h_{i,t+1} = \frac{1}{2}\left[h_{i-1,t} + h_{i+1,t} + b\,(q_{i-1,t} - q_{i+1,t}) + r\,(q_{i+1,t}|q_{i+1,t}| - q_{i-1,t}|q_{i-1,t}|)\right]$   (17)

$q_{i,t+1} = \frac{1}{b}\left[h_{i,t+1} - h_{i+1,t} + q_{i+1,t}\,(b - r|q_{i+1,t}|)\right]$   (18)

where $r$ is the resistance coefficient, which is associated with the steady state, and $b$ is the impedance coefficient, which is associated with the transient state. If $b = 0$, the set of equations (17), (18) is reduced to the steady state, where the head loss along a pipe occurs only due to friction [5].
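The explicit update of Equations (17) and (18) for the interior nodes of a single pipe can be sketched as follows. This is an illustrative fragment under assumptions made here, not a full transient solver: the function name `moc_interior_step` is hypothetical, heads and flows are plain Python lists indexed by the space index $i$, and the two boundary nodes are simply copied because their update depends on the boundary condition in use.

```python
def moc_interior_step(h, q, b, r):
    """Advance interior nodes by one time step of the method of characteristics.

    h, q: heads and flows at time t, indexed by grid node.
    b, r: impedance and resistance coefficients of the pipe.
    Returns (h_new, q_new) at time t + 1; boundary nodes are left unchanged.
    """
    h_new, q_new = list(h), list(q)
    for i in range(1, len(h) - 1):
        # Characteristic constants from the upstream (+) and downstream (-) nodes.
        c_p = h[i - 1] + q[i - 1] * (b - r * abs(q[i - 1]))
        c_m = h[i + 1] - q[i + 1] * (b - r * abs(q[i + 1]))
        h_new[i] = 0.5 * (c_p + c_m)        # head at the new point
        q_new[i] = (h_new[i] - c_m) / b     # flow from the C- characteristic
    return h_new, q_new


h1, q1 = moc_interior_step([10.0, 20.0, 30.0], [1.0, 2.0, 3.0], b=2.0, r=0.5)
print(h1[1], q1[1])
```

Writing the update in terms of $C_P$ and $C_M$ keeps the interior step and the boundary conditions in the same form, since both are expressed through these two constants.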

At the boundaries, specific conditions need to be defined describing the head-flow relation. Common boundary conditions, such as cross-connections and control valves, can be found in [6]. We give an example of a boundary condition for a pipe burst at location $i$ using the orifice head-flow equation:

$h_{i,t+1} + \frac{b}{2} C_d A_d \sqrt{2 g h_{i,t+1}} - \frac{C_P + C_M}{2} = 0$   (19)

where $C_d$ is the orifice discharge coefficient, $A_d$ is the cross-section area of the orifice, $C_P = h_{i-1,t} + q_{i-1,t}(b - r|q_{i-1,t}|)$, and $C_M = h_{i+1,t} - q_{i+1,t}(b - r|q_{i+1,t}|)$.

Before the burst occurs, the coefficient $A_d$ is equal to zero and Equation (19) reduces to Equation (17). During a burst $A_d$ is positive, hence we can expect a change in the hydraulic head. The relationship between the head and the pressure measured by sensors at location $i$ is relative to the elevation of location $i$, denoted by $z_i$, i.e., at any given time, $p_{i,t} = (h_{i,t} - z_i)\rho g$. Hence, we can expect to detect the pipe burst by observing the differences between the expected and the measured pressures at a given time and location in the network. Similar approaches have been previously suggested in [7].
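Equation (19) is a quadratic in $\sqrt{h_{i,t+1}}$, so the head at the burst node can be obtained in closed form. The sketch below illustrates this under assumptions made here: the function name `burst_head` and the numerical values are hypothetical, and $(C_P + C_M)/2$ is assumed to be non-negative so that the square root is real.

```python
import math


def burst_head(c_p, c_m, b, c_d, a_d, g=9.81):
    """Head at a burst node from the orifice relation, solved for sqrt(h)."""
    k = 0.5 * (c_p + c_m)                        # head without a burst (a_d = 0)
    beta = 0.5 * b * c_d * a_d * math.sqrt(2.0 * g)
    # s = sqrt(h) is the positive root of s**2 + beta * s - k = 0.
    s = 0.5 * (-beta + math.sqrt(beta * beta + 4.0 * k))
    return s * s


h_before = burst_head(c_p=62.0, c_m=58.0, b=100.0, c_d=0.6, a_d=0.0)
h_during = burst_head(c_p=62.0, c_m=58.0, b=100.0, c_d=0.6, a_d=0.001)
print(h_before, h_during)
```

With a zero orifice area the result equals $(C_P + C_M)/2$, and any positive area lowers the head, which is the pressure drop the sensors are expected to observe.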

Figure 9 shows a raw pressure signal recorded by a Visenti [2] online sensor during a pipe burst event with a 250 [Hz] sampling frequency. It shows the dynamic nature of pressure, a sharp drop in the pressure during a pipe burst event, and a rapid return to the normal operating range. The drop in pressure lasts just a few seconds, hence it cannot be detected using more traditional methods such as supervisory control and data acquisition (SCADA) systems, which typically operate on a time scale of minutes.

10 Submodularity 

Lemma 10.1 The detection function $f_D$ (as defined in Equation (6) of the main paper) is submodular.



Fig. 9. Pressure signal during a burst event recorded from 
online sensor installed in a water system 

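Proof - Let $\mathcal{C}_s \subseteq \mathcal{C}_r \subseteq \mathcal{C}$, and $C_i \in \mathcal{C} \setminus \mathcal{C}_r$, then we need to show

$f_D(\mathcal{C}_s \cup \{C_i\}) - f_D(\mathcal{C}_s) \ge f_D(\mathcal{C}_r \cup \{C_i\}) - f_D(\mathcal{C}_r)$.

Assume that $C_i' = C_i \setminus \bigcup_{C_j \in \mathcal{C}_s} C_j$, then

$f_D(\mathcal{C}_s \cup \{C_i\}) = f_D(\mathcal{C}_s \cup \{C_i'\}) = f_D(\mathcal{C}_s) + f_D(\{C_i'\})$.   (20)

Moreover, let $\mathcal{X} = \left(\bigcup_{C_k \in \mathcal{C}_r} C_k\right) \setminus \left(\left(\bigcup_{C_j \in \mathcal{C}_s} C_j\right) \cup C_i'\right)$, and $\mathcal{Y} = \left(\bigcup_{C_k \in \mathcal{C}_r} C_k\right) \cap C_i'$, then

$f_D(\mathcal{C}_r \cup \{C_i\}) = f_D(\mathcal{C}_r \cup \{C_i'\}) = f_D(\mathcal{C}_s \cup \{C_i\}) + f_D(\{\mathcal{X}\})$,   (21)

and

$f_D(\mathcal{C}_r) = f_D(\mathcal{C}_s) + f_D(\{\mathcal{X}\}) + f_D(\{\mathcal{Y}\})$.   (22)

Substituting (22) into (21) gives

$f_D(\mathcal{C}_s \cup \{C_i\}) - f_D(\mathcal{C}_s) - f_D(\{\mathcal{Y}\}) = f_D(\mathcal{C}_r \cup \{C_i\}) - f_D(\mathcal{C}_r)$.

The required result follows directly. □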
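The diminishing-returns inequality can also be checked numerically on a small instance by brute force. The sketch below assumes, for illustration only, that the detection function counts the events covered by the selected sensors; Equation (6) is not reproduced in this section, and a normalized version would differ only by a constant factor. The function names are hypothetical, and the sets are the first five sensor sets of the illustrative example.

```python
from itertools import combinations


def coverage(sets, chosen):
    """Number of events detected by at least one chosen sensor."""
    covered = set()
    for i in chosen:
        covered |= sets[i]
    return len(covered)


def is_submodular(sets):
    """Check the marginal-gain inequality for every nested pair of selections."""
    indices = range(len(sets))
    for r in range(len(sets) + 1):
        for big in combinations(indices, r):
            for s in range(r + 1):
                for small in combinations(big, s):
                    for i in indices:
                        if i in big:
                            continue
                        gain_small = coverage(sets, small + (i,)) - coverage(sets, small)
                        gain_big = coverage(sets, big + (i,)) - coverage(sets, big)
                        if gain_small < gain_big:
                            return False
    return True


SMALL_SETS = [
    {1, 2, 3, 4, 5}, {1, 2, 3, 6, 8}, {1, 2, 4, 5, 6, 7, 9},
    set(range(2, 11)), {1, 3, 4, 6, 7, 8, 10},
]
print(is_submodular(SMALL_SETS))
```

Such a check is exponential in the number of sensors and only serves as a sanity test on toy instances; the lemma itself covers the general case.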

11 Augmented greedy — Example 3 (cont.) 

In each iteration, for every sensor $i$ not in the test cover, $C_i$ is decomposed into two sets, namely $X_i = C_i \setminus C_{cov}$ and $Y_i = C_i \cap C_{cov}$. The utility of including a sensor in the test cover is calculated in terms of $x_i$ and $y_i$. $x_i$ counts the number of pair-wise link failures detected by $C_i$ corresponding to the links not in $C_{cov}$, whereas $y_i$ counts the undetected pair-wise link failures corresponding to the links in $C_{cov}$ that can be detected by $C_i$. Then, a sensor that maximizes the utility is selected, and $C_{cov}$, which is the set of covered (detected) events, and $G_u$, which is the set of undetected pair-wise events corresponding to the events detected by the sensor $u$ already included in the test cover, are updated. We give detailed steps of the algorithm using the illustrative example in the paper (Figure 10).



Fig. 10. Illustrative example layout 
First iteration of the while loop, $j = 1$. We denote the total number of events detected by sensor $i$ as $k_i$, i.e., $k_i = |C_i|$. Similarly, $k_{i,j}$ denotes the number of undetected events that are detected by the sensor $i$ in the $j$th iteration, i.e., $k_{i,j} = |C_i \setminus C_{cov}|$. In the first iteration $k_{i,1} = k_i$, $\forall i$. In this example, the set of all $k_i$'s is $\{5, 5, 7, 9, 7, 6, 7, 6\}$. Then, for each sensor $i$, we compute the number of new pair-wise events detected, $x_i = k_{i,1}(n - k_{i,1})$. For instance, for sensor 1, $x_1 = 5(10 - 5) = 25$. Next, we need to compute $y_i$ for all $i$. Since there is no sensor in the test cover in the first iteration, $y_i = 0$ for all $i$. The total utility of selecting a sensor is equal to $w_i = x_i + y_i$. The maximum $w_i$ is attained for sensors 1 and 2. We select sensor 1 to be included in the test cover, and update $\mathcal{C}^* \leftarrow \{C_{i^*}\}$, and $G_1$, which is the set of all undetected pair-wise events corresponding to the events in $X_1 = C_1 \setminus C_{cov}$. Here, $G_1 = \{\{1,2\}, \{1,3\}, \cdots, \{4,5\}\}$. Finally, we update the set of covered (detected) events $C_{cov} \leftarrow C_{cov} \cup C_1 = C_1 = \{1,2,3,4,5\}$.

Second iteration of the while loop, $j = 2$. At the beginning of the second iteration, the event space has been reduced from 10 to 5, i.e., $n_2 = 5$. For each sensor $i$, we first compute $X_i$, which is the set of undetected events (events that are not in $C_{cov}$) that are detected by the sensor $i$, i.e., $X_i \leftarrow (C_i \setminus C_{cov})$. Then, we compute $x_i = k_{i,2}(n_2 - k_{i,2})$, where $k_{i,2}$ is $|X_i|$. For instance, for sensor 2, $C_2 = \{1,2,3,6,8\}$, then $X_2 \leftarrow (C_2 \setminus C_{cov}) = \{6,8\}$ and $k_{2,2} = 2$. Then $x_2 = 2(5 - 2) = 6$. Next, for each sensor $i$, we compute $y_i$, which is the number of pair-wise events in $G_1$ that are detected by the sensor $i$. For instance, in the case of sensor 2, six of the pair-wise events in $G_1$, given by $\{\{1,4\}, \{1,5\}, \{2,4\}, \{2,5\}, \{3,4\}, \{3,5\}\}$, are detected by the sensor 2. Thus, we get $y_2 = 6$. The values of $y_i$ for all $i$ are given in Table 3. After this, the utility of each sensor is computed as $w_i = x_i + y_i$. For sensor 2, the value of $w_2$ is 12, which turns out to be the maximum among all the sensors in the second iteration (sensor 6 attains the same value; see Table 3). Thus, sensor 2 is included in the test cover. We update $\mathcal{C}^* \leftarrow \mathcal{C}^* \cup \{C_2\}$, $C_{cov} = \{1,2,3,4,5,6,8\}$, and

$G_1 \leftarrow G_1 \setminus \{\{1,4\},\{1,5\},\{2,4\},\{2,5\},\{3,4\},\{3,5\}\} = \{\{1,2\},\{1,3\},\{2,3\},\{4,5\}\}$.

At the same time, a new set $G_2$ is created, which contains the set of pair-wise events in $X_2$. Since $X_2 = \{6,8\}$, we get $G_2 = \{\{6,8\}\}$.

Next iteration. We continue with the same steps until no improvement can be made, i.e., $w_i = 0$ for each sensor. At the end of the algorithm, sensors in the set $\{1, 2, 3, 5\}$ are included in the test cover.

For this example, a complete account of the values of variables in each iteration of the algorithm is given in Table 3.

12 Evaluation on real networks (cont.) 

For all networks [1,3], the layouts and the simulation plots illustrating the four performance metrics are shown in Table 4. For ease of presentation, the worst localization set size, $I_w$, is normalized by dividing it by the number of pipes.
Table 3
Illustrative example demonstrating the steps in the augmented greedy solution of the MTC problem

            j = 1                      j = 2                      j = 3                      j = 4                      j = 5
C_cov       ∅                          {1,2,3,4,5}                {1,2,3,4,5,6,8}            {1,2,...,9}                {1,2,...,10}
n_j         10                         5                          3                          1                          0

X_1, Y_1    {1,2,3,4,5}, ∅             -                          -                          -                          -
X_2, Y_2    {1,2,3,6,8}, ∅             {6,8}, {1,2,3}             -                          -                          -
X_3, Y_3    {1,2,4,5,6,7,9}, ∅         {6,7,9}, {1,2,4,5}         {7,9}, {1,2,4,5,6}         -                          -
X_4, Y_4    {2,3,...,10}, ∅            {6,...,10}, {2,3,4,5}      {7,9,10}, {2,...,6,8}      {10}, {2,3,...,9}          ∅, {2,3,...,10}
X_5, Y_5    {1,3,4,6,7,8,10}, ∅        {6,7,8,10}, {1,3,4}        {7,10}, {1,3,4,6,8}        {10}, {1,3,4,6,7,8}        -
X_6, Y_6    {2,4,5,7,9,10}, ∅          {7,9,10}, {2,4,5}          {7,9,10}, {2,4,5}          {10}, {2,4,5,7,9}          ∅, {2,4,5,7,9,10}
X_7, Y_7    {4,5,...,10}, ∅            {6,...,10}, {4,5}          {7,9,10}, {4,5,6,8}        {10}, {4,5,...,9}          ∅, {4,5,...,10}
X_8, Y_8    {3,6,...,10}, ∅            {6,...,10}, {3}            {7,9,10}, {3,6,8}          {10}, {3,6,7,8,9}          ∅, {3,6,...,10}

x_1, y_1    25, 0*                     -                          -                          -                          -
x_2, y_2    25, 0                      6, 6*                      -                          -                          -
x_3, y_3    21, 0                      6, 4                       2, 3*                      -                          -
x_4, y_4    9, 0                       0, 4                       0, 2                       0, 1                       0, 0
x_5, y_5    21, 0                      4, 6                       2, 3                       0, 3*                      -
x_6, y_6    24, 0                      6, 6                       0, 2                       0, 1                       0, 0
x_7, y_7    21, 0                      0, 6                       0, 0                       0, 0                       0, 0
x_8, y_8    24, 0                      0, 4                       0, 2                       0, 0                       0, 0

G_0         ∅                          ∅                          ∅                          ∅                          ∅
G_1         {{1,2},{1,3},...,{4,5}}    {{1,2},{1,3},{2,3},{4,5}}  {{1,2},{4,5}}              ∅                          ∅
G_2         -                          {{6,8}}                    ∅                          ∅                          ∅
G_3         -                          -                          {{7,9}}                    ∅                          ∅
G_4         -                          -                          -                          {10} → ∅                   ∅

At j = 1, G_1 contains all ten pairs of {1,...,5}: {1,2}, {1,3}, {1,4}, {1,5}, {2,3}, {2,4}, {2,5}, {3,4}, {3,5}, {4,5}.
* is the selected sensor with the maximum utility, i.e., $w_{i^*} \leftarrow \max_i w_i$